Tool Use from Python
Before reading any explanation, predict what happens when you run this code:
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    }
]

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What is the weather in Paris right now?"}],
)

print(response.stop_reason)
print(type(response.content[0]))
Write your prediction. Then continue.
# Output:
# tool_use
# <class 'anthropic.types.tool_use_block.ToolUseBlock'>
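The model did not answer the question. It stopped and asked your Python code to run a tool. stop_reason is "tool_use", not "end_turn", and the content block is a structured tool call rather than text. (The model sometimes emits a short text block before the tool call; in that case content[0] is a TextBlock and the ToolUseBlock follows it.) The model's response is a request, not an answer.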
This is the fundamental shift in agentic AI engineering: you are no longer calling an API and reading a response. You are running a loop. The model asks, your code executes, you report back, the model continues. Your Python code is the execution environment for the model's plan.
This lesson teaches you to build that loop correctly.
What You Will Learn
- The tool use protocol: how the model requests a tool call and how you respond
- Defining tools with JSON schemas, Pydantic models, and decorated Python functions
- The agentic loop: structure, stopping conditions, and safety limits
- Anthropic tool use API: tool_choice, ToolResultBlock, multi-turn with tools
- OpenAI function calling: parallel calls, tool_choice, required vs auto
- Building a ToolRegistry class that auto-generates schemas from type hints and docstrings
- Error handling: what to send back when a tool raises an exception
- Timeout and iteration limits to prevent runaway agents
- Parallel tool execution when the model requests multiple tools simultaneously
- Real tool implementations: web search, code execution, database queries, file I/O
- Testing tool-using agents
Prerequisites
- Familiarity with the Anthropic and OpenAI Python SDKs (Lessons 1-2)
- Python type hints and inspect module basics
- Dataclasses and TypedDict
Part 1 -- How Tool Use Works
The flow is a multi-turn protocol, not a single API call.
The model never executes tools directly. It generates a structured request saying "please run tool X with arguments Y and tell me what you get." Your Python code runs the tool, then sends the result back in the next API call. The model then generates either the final answer or another tool call.
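The round trip described above can be sketched as plain data, with no network call involved. The block id and the weather values below are invented for illustration, and `unanswered_tool_calls` is a hypothetical helper; the dict shapes follow the message format used in the later parts of this lesson.

```python
# Hypothetical conversation history after one complete tool round trip.
# The id and the weather values are invented for illustration.
history = [
    {"role": "user", "content": "What is the weather in Paris right now?"},
    {
        "role": "assistant",
        "content": [
            {"type": "tool_use", "id": "toolu_example_1",
             "name": "get_weather", "input": {"city": "Paris"}},
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": "toolu_example_1",
             "content": '{"temperature_c": 18, "description": "Light rain"}'},
        ],
    },
]


def unanswered_tool_calls(messages: list[dict]) -> set[str]:
    """Return the ids of tool_use blocks that have no matching tool_result."""
    requested: set[str] = set()
    answered: set[str] = set()
    for message in messages:
        content = message["content"]
        if isinstance(content, str):
            continue  # Plain text message, no blocks to inspect
        for block in content:
            if block["type"] == "tool_use":
                requested.add(block["id"])
            elif block["type"] == "tool_result":
                answered.add(block["tool_use_id"])
    return requested - answered


print(unanswered_tool_calls(history))      # set()
print(unanswered_tool_calls(history[:2]))  # {'toolu_example_1'}
```

A check like this is handy in tests: every tool call the model makes should have exactly one result with the same id before the next request is sent.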
Why This Design?
The model has no internet access, no filesystem access, no ability to run code. It is a text-in, text-out function. Tool use is the mechanism by which it extends its capabilities into the real world -- through your code as the intermediary. You control which tools exist, what they can do, and what information flows back to the model. This is both the power and the responsibility of tool use engineering.
Part 2 -- Defining Tools
Raw JSON Schema
The most explicit form -- useful when you need precise control or are generating schemas programmatically:
# Tools are defined as a list of dicts matching the JSON Schema spec
# The 'input_schema' field describes the arguments the tool expects
weather_tool = {
"name": "get_weather",
"description": (
"Retrieve current weather conditions for a specified city. "
"Returns temperature in Celsius, a weather description, and humidity. "
"Use this when the user asks about current weather."
),
"input_schema": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name, e.g. 'London' or 'New York'",
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature units. Defaults to celsius.",
},
},
"required": ["city"], # 'units' is optional -- omitted from required list
},
}
search_tool = {
"name": "web_search",
"description": (
"Search the web for current information. Use when the answer requires "
"recent information not in your training data. Returns a list of search "
"result snippets with titles and URLs."
),
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "The search query string.",
},
"num_results": {
"type": "integer",
"description": "Number of results to return. Default 5, max 10.",
"minimum": 1,
"maximum": 10,
},
},
"required": ["query"],
},
}
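Because the model fills in the arguments, it can help to check them against the schema before running the tool. The following is a minimal sketch using only the standard library. `check_input` is a hypothetical helper that covers just `required`, basic `type`, `enum`, `minimum` and `maximum`; a full JSON Schema validator would be the sturdier choice in a real system.

```python
from typing import Any

# JSON Schema type name -> Python types accepted for it
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "array": (list,),
    "object": (dict,),
}


def check_input(schema: dict[str, Any], args: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: list[str] = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required argument '{name}'")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            problems.append(f"unexpected argument '{name}'")
            continue
        json_type = spec.get("type")
        if isinstance(json_type, str) and json_type in _JSON_TYPES:
            type_ok = isinstance(value, _JSON_TYPES[json_type])
            # bool is a subclass of int, so reject it for non-boolean types
            if isinstance(value, bool) and json_type != "boolean":
                type_ok = False
            if not type_ok:
                problems.append(f"'{name}' should be of type {json_type}")
                continue
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"'{name}' must be one of {spec['enum']}")
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if "minimum" in spec and value < spec["minimum"]:
                problems.append(f"'{name}' must be >= {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                problems.append(f"'{name}' must be <= {spec['maximum']}")
    return problems


search_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "num_results": {"type": "integer", "minimum": 1, "maximum": 10},
    },
    "required": ["query"],
}

print(check_input(search_schema, {"query": "python", "num_results": 50}))
```

Returning the problems as text, rather than raising, makes it easy to send them back to the model as an error result so it can correct the call.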
Pydantic-Based Tool Schemas
Pydantic models generate JSON schemas automatically, so you do not write them by hand, and the same model class can validate the arguments the model sends back. That combination makes them a good fit for production systems:
from pydantic import BaseModel, Field
import json
class GetWeatherInput(BaseModel):
"""Input schema for the get_weather tool."""
city: str = Field(description="City name, e.g. 'London' or 'New York'")
units: str = Field(
default="celsius",
description="Temperature units: 'celsius' or 'fahrenheit'",
pattern="^(celsius|fahrenheit)$",
)
class WebSearchInput(BaseModel):
query: str = Field(description="The search query string.")
num_results: int = Field(
default=5,
ge=1,
le=10,
description="Number of results to return. Default 5, max 10.",
)
def pydantic_to_anthropic_tool(
name: str,
description: str,
model: type[BaseModel],
) -> dict:
"""Convert a Pydantic model into an Anthropic tool definition.
The Pydantic model provides the input_schema automatically.
Field descriptions populate the JSON schema 'description' fields.
"""
schema = model.model_json_schema()
# Remove Pydantic-specific fields that Anthropic does not expect
schema.pop("title", None)
return {
"name": name,
"description": description,
"input_schema": schema,
}
# Generate the tool definitions
weather_tool = pydantic_to_anthropic_tool(
"get_weather",
"Get current weather conditions for a city.",
GetWeatherInput,
)
# Verify the generated schema
print(json.dumps(weather_tool, indent=2))
Part 3 -- The Agentic Loop
The agentic loop is the core pattern for tool-using applications. It runs until the model produces a final answer or a safety limit is hit.
import anthropic
from anthropic.types import ToolUseBlock, TextBlock, Message
client = anthropic.Anthropic()
def run_agent(
user_message: str,
tools: list[dict],
tool_implementations: dict[str, callable],
system_prompt: str = "",
model: str = "claude-opus-4-6",
max_iterations: int = 10,
) -> str:
"""Run an LLM agent with tool use until it produces a final answer.
Args:
user_message: The user's initial request.
tools: List of tool definitions (Anthropic format).
tool_implementations: Dict mapping tool name to Python callable.
system_prompt: Optional system-level instructions.
model: The model to use.
max_iterations: Safety limit on the number of tool call rounds.
Returns:
The model's final text response.
Raises:
MaxIterationsError: If max_iterations is reached without a final answer.
"""
messages = [{"role": "user", "content": user_message}]
for iteration in range(max_iterations):
# Call the model
response = client.messages.create(
model=model,
max_tokens=4_096,
system=system_prompt,
tools=tools,
messages=messages,
)
# If the model is done, extract and return the final text
if response.stop_reason == "end_turn":
            # Return the first TextBlock in the response
for block in response.content:
if isinstance(block, TextBlock):
return block.text
return "" # No text block (shouldn't happen, but be defensive)
# The model wants to use tools
if response.stop_reason == "tool_use":
# Append the model's response (including tool call requests) to history
# IMPORTANT: you must include the model's tool_use blocks in the history
# before adding tool results, or the API will reject the next call
messages.append({"role": "assistant", "content": response.content})
# Execute each tool the model requested
tool_results = []
for block in response.content:
if not isinstance(block, ToolUseBlock):
continue
result = execute_tool_safely(
tool_name=block.name,
tool_input=block.input,
implementations=tool_implementations,
)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id, # Must match the ToolUseBlock's id
"content": result["content"],
"is_error": result["is_error"],
})
# Send tool results back to the model
messages.append({"role": "user", "content": tool_results})
continue
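        # Unexpected stop reason (e.g. "max_tokens"): fail loudly instead of
        # reporting it as an iteration-limit problem
        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason!r}")

    raise MaxIterationsError(
        f"Agent did not produce a final answer within {max_iterations} iterations. "
        f"Last stop_reason: {response.stop_reason}"
    )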
class MaxIterationsError(RuntimeError):
"""Raised when an agent loop exceeds the iteration limit."""
pass
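To see the control flow without calling the API, the loop can be exercised against a scripted stand-in for the model. Everything here is an assumption made for illustration: `scripted_model` takes the place of `client.messages.create`, responses are plain dicts rather than SDK objects, and the `add` tool is invented.

```python
import json

# Scripted replies standing in for the model (hypothetical, no API call).
SCRIPT = [
    {"stop_reason": "tool_use", "content": [
        {"type": "tool_use", "id": "call_1", "name": "add", "input": {"a": 2, "b": 3}},
    ]},
    {"stop_reason": "end_turn", "content": [
        {"type": "text", "text": "2 + 3 = 5"},
    ]},
]


def scripted_model(messages: list[dict]) -> dict:
    """Return the next scripted reply, based on how many assistant turns exist."""
    turn = sum(1 for m in messages if m["role"] == "assistant")
    return SCRIPT[turn]


def run_scripted_agent(user_message: str, tools: dict, max_iterations: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        reply = scripted_model(messages)
        if reply["stop_reason"] == "end_turn":
            return "".join(b["text"] for b in reply["content"] if b["type"] == "text")
        # Record the assistant turn first, then answer every tool_use block
        messages.append({"role": "assistant", "content": reply["content"]})
        results = []
        for block in reply["content"]:
            if block["type"] != "tool_use":
                continue
            output = tools[block["name"]](**block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": json.dumps(output),
            })
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no final answer within {max_iterations} iterations")


print(run_scripted_agent("What is 2 + 3?", {"add": lambda a, b: a + b}))  # 2 + 3 = 5
```

The same structure doubles as a deterministic test harness: the script fixes what the "model" says, so the assertions only exercise your loop and tool code.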
Safe Tool Execution
import json
from typing import Any
def execute_tool_safely(
tool_name: str,
tool_input: dict,
implementations: dict[str, callable],
) -> dict[str, Any]:
"""Execute a tool and return a result dict that is always safe to send back.
Key design principle: the tool execution layer NEVER crashes the agent loop.
If a tool fails, we tell the model what went wrong and let it decide how
to proceed (retry, use a different tool, report the error to the user).
Returns:
{
"content": str | list, # The result text, or error description
"is_error": bool, # True if the tool raised an exception
}
"""
if tool_name not in implementations:
return {
"content": (
f"Tool '{tool_name}' is not available. "
f"Available tools: {list(implementations.keys())}"
),
"is_error": True,
}
try:
result = implementations[tool_name](**tool_input)
# Normalise result to string if it is not already
if isinstance(result, str):
content = result
elif isinstance(result, (dict, list)):
content = json.dumps(result, indent=2, default=str)
else:
content = str(result)
return {"content": content, "is_error": False}
except TypeError as e:
# Wrong arguments -- this usually means the schema is wrong
return {
"content": (
f"Tool '{tool_name}' received unexpected arguments: {e}. "
f"Arguments provided: {tool_input}"
),
"is_error": True,
}
except Exception as e:
# Any other failure -- give the model enough context to react
return {
"content": (
f"Tool '{tool_name}' raised {type(e).__name__}: {e}\n"
f"Arguments: {tool_input}"
),
"is_error": True,
}
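One caveat with catching TypeError around the call: a TypeError raised inside the tool body is also reported as an argument problem. A possible refinement, sketched here with the standard library's `inspect.signature(...).bind`, checks the arguments before the call so the two cases stay distinct. The helper name `bind_arguments` and the sample `get_weather` function are hypothetical.

```python
import inspect
from typing import Any, Callable, Optional


def bind_arguments(func: Callable[..., Any], tool_input: dict[str, Any]) -> Optional[str]:
    """Return an error message if tool_input does not fit func's signature,
    or None if the call can go ahead."""
    try:
        # bind() raises TypeError for missing or unexpected arguments
        # without actually calling the function
        inspect.signature(func).bind(**tool_input)
    except TypeError as exc:
        return f"Invalid arguments for '{func.__name__}': {exc}"
    return None


def get_weather(city: str, units: str = "celsius") -> str:
    return f"{city}: 20 degrees ({units})"


print(bind_arguments(get_weather, {"city": "Paris"}))  # None
print(bind_arguments(get_weather, {"town": "Paris"}))  # an error message
```

With this check in front of the call, any TypeError caught afterwards can be reported as a failure inside the tool rather than as a schema mismatch.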
Part 4 -- Anthropic Tool Use: Full Protocol
import json
import anthropic
from anthropic.types import (
ToolUseBlock,
TextBlock,
ToolResultBlockParam,
)
client = anthropic.Anthropic()
def demonstrate_anthropic_tool_protocol() -> None:
"""Show the raw multi-turn Anthropic tool use protocol step by step."""
# ----- Turn 1: Model requests a tool -----
response1 = client.messages.create(
model="claude-opus-4-6",
max_tokens=1_024,
tools=[weather_tool],
# tool_choice controls whether and how the model uses tools:
# {"type": "auto"} -- model decides (default)
# {"type": "any"} -- model must use at least one tool
# {"type": "tool", "name": "get_weather"} -- must use this specific tool
tool_choice={"type": "auto"},
messages=[
{"role": "user", "content": "What is the weather in Tokyo right now?"}
],
)
print(f"Stop reason: {response1.stop_reason}") # tool_use
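    # The model may emit a TextBlock before the tool call, so search for the
    # first ToolUseBlock instead of assuming it is content[0]
    tool_block = next(b for b in response1.content if isinstance(b, ToolUseBlock))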
print(f"Tool requested: {tool_block.name}") # get_weather
print(f"Tool ID: {tool_block.id}") # toolu_01Abc...
print(f"Tool input: {tool_block.input}") # {'city': 'Tokyo', 'units': 'celsius'}
# ----- Execute the tool in Python -----
tool_result_text = json.dumps({
"city": "Tokyo",
"temperature_c": 22,
"description": "Partly cloudy",
"humidity_pct": 68,
})
# ----- Turn 2: Send tool result back -----
# The history must include:
# 1. The original user message
# 2. The model's response including the ToolUseBlock (response1.content)
# 3. A new user message containing the ToolResultBlockParam
response2 = client.messages.create(
model="claude-opus-4-6",
max_tokens=1_024,
tools=[weather_tool],
messages=[
{"role": "user", "content": "What is the weather in Tokyo right now?"},
{"role": "assistant", "content": response1.content}, # Include tool call
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": tool_block.id, # Must match the tool call ID
"content": tool_result_text,
# "is_error": True # Set this if the tool failed
}
],
},
],
)
print(f"\nStop reason: {response2.stop_reason}") # end_turn
print(f"Final answer: {response2.content[0].text}")
# "The current weather in Tokyo is 22 degrees Celsius with partly cloudy skies..."
Handling Multiple Tool Calls in One Turn
The model may request multiple tools in a single response. Process all of them before sending results back:
def handle_multi_tool_response(
response: anthropic.types.Message,
implementations: dict[str, callable],
) -> list[dict]:
"""Process all tool calls in a model response.
The model may request 1 or more tools in a single turn.
Always collect ALL results before sending back -- never send
partial results from some tools while others are pending.
Returns:
List of ToolResultBlockParam dicts ready to send as the next user message.
"""
tool_results = []
for block in response.content:
if not isinstance(block, ToolUseBlock):
continue # Skip TextBlocks (model may include explanation text)
result = execute_tool_safely(
tool_name=block.name,
tool_input=block.input,
implementations=implementations,
)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": result["content"],
"is_error": result["is_error"],
})
return tool_results
Part 5 -- OpenAI Function Calling
OpenAI uses slightly different terminology but the same protocol structure:
import openai
import json
client_oai = openai.OpenAI()
def openai_tool_definition() -> dict:
"""OpenAI uses 'function' type with 'function' sub-object."""
return {
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a city.",
"parameters": { # 'parameters' not 'input_schema'
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name",
},
"units": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature units",
},
                },
                "required": ["city", "units"],  # Strict mode: every property must be listed
                "additionalProperties": False,  # Required when strict is True
},
"strict": True, # Structured outputs -- model follows schema exactly
},
}
def run_openai_agent(
user_message: str,
tools: list[dict],
tool_implementations: dict[str, callable],
model: str = "gpt-4o",
max_iterations: int = 10,
) -> str:
"""OpenAI agentic loop with function calling."""
messages = [{"role": "user", "content": user_message}]
for iteration in range(max_iterations):
response = client_oai.chat.completions.create(
model=model,
messages=messages,
tools=tools,
# tool_choice options:
# "auto" -- model decides (default)
# "required" -- model must call at least one tool
# "none" -- model must not call any tools
# {"type": "function", "function": {"name": "..."}} -- specific tool
tool_choice="auto",
parallel_tool_calls=True, # Allow multiple simultaneous tool calls
)
choice = response.choices[0]
        # Add assistant response to history; exclude_none avoids sending
        # null-valued optional fields back to the API
        messages.append(choice.message.model_dump(exclude_none=True))
if choice.finish_reason == "stop":
return choice.message.content or ""
if choice.finish_reason == "tool_calls":
# Process all tool calls (may be parallel if parallel_tool_calls=True)
for tool_call in choice.message.tool_calls:
try:
arguments = json.loads(tool_call.function.arguments)
except json.JSONDecodeError as e:
result_text = f"Failed to parse tool arguments: {e}"
is_error = True
else:
result = execute_tool_safely(
tool_name=tool_call.function.name,
tool_input=arguments,
implementations=tool_implementations,
)
result_text = result["content"]
is_error = result["is_error"]
                # OpenAI tool messages have no is_error field, so mark
                # failures with an "ERROR:" prefix as a convention
                if is_error:
                    result_text = f"ERROR: {result_text}"
                # OpenAI tool results go as individual 'tool' role messages
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,  # Must match the tool_call's id
                    "content": result_text,
                })
continue
break # Unexpected finish_reason
raise MaxIterationsError(
f"OpenAI agent exceeded {max_iterations} iterations."
)
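If you want to reuse Anthropic-style tool definitions with strict mode, the schema needs reshaping. The sketch below assumes OpenAI's documented strict-mode rules (every property listed in `required`, `additionalProperties` set to false, optional parameters expressed by allowing null) and handles flat schemas only. `to_openai_strict_tool` and `drop_nulls` are hypothetical helpers, and which schema keywords strict mode accepts should be checked against the current OpenAI documentation.

```python
import copy
from typing import Any


def to_openai_strict_tool(anthropic_tool: dict[str, Any]) -> dict[str, Any]:
    """Convert an Anthropic-style tool dict into an OpenAI function tool
    shaped for strict mode. Nested object schemas are not handled."""
    schema = copy.deepcopy(anthropic_tool["input_schema"])  # Leave the input untouched
    properties = schema.get("properties", {})
    required = set(schema.get("required", []))
    for name, prop in properties.items():
        if name not in required and isinstance(prop.get("type"), str):
            # Optional parameter: allow null so the model can "omit" it
            prop["type"] = [prop["type"], "null"]
            if "enum" in prop:
                prop["enum"] = [*prop["enum"], None]
    schema["required"] = list(properties)
    schema["additionalProperties"] = False
    return {
        "type": "function",
        "function": {
            "name": anthropic_tool["name"],
            "description": anthropic_tool["description"],
            "parameters": schema,
            "strict": True,
        },
    }


def drop_nulls(arguments: dict[str, Any]) -> dict[str, Any]:
    """Remove null arguments so Python defaults apply when calling the tool."""
    return {key: value for key, value in arguments.items() if value is not None}


weather_tool = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

converted = to_openai_strict_tool(weather_tool)
print(converted["function"]["parameters"]["required"])  # ['city', 'units']
print(drop_nulls({"city": "Paris", "units": None}))     # {'city': 'Paris'}
```

Calling `drop_nulls` on the parsed arguments before `execute_tool_safely` keeps the Python function's default values in charge of anything the model left as null.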
Parallel Tool Calls
When parallel_tool_calls=True and the model requests multiple tools in one turn, you can execute them concurrently, provided the tools do not depend on each other's side effects:
import asyncio
import json
import openai
client_oai_async = openai.AsyncOpenAI()
async def execute_tools_parallel(
tool_calls: list,
implementations: dict[str, callable],
) -> list[dict]:
"""Execute multiple tool calls in parallel using asyncio.
When the model requests 3 tools at once, there is no reason to run them
sequentially. Execute all of them concurrently and return all results.
Total time = max(individual_tool_times), not sum(individual_tool_times).
"""
async def run_one(tool_call) -> dict:
try:
arguments = json.loads(tool_call.function.arguments)
except json.JSONDecodeError as e:
return {
"role": "tool",
"tool_call_id": tool_call.id,
"content": f"Failed to parse arguments: {e}",
}
# If the tool implementation is async, await it; otherwise run in executor
impl = implementations.get(tool_call.function.name)
if impl is None:
content = f"Unknown tool: {tool_call.function.name}"
elif asyncio.iscoroutinefunction(impl):
try:
result = await impl(**arguments)
content = json.dumps(result) if not isinstance(result, str) else result
except Exception as e:
content = f"Tool error: {type(e).__name__}: {e}"
        else:
            # Run synchronous tools in a worker thread to avoid blocking the event loop
            # (asyncio.to_thread replaces the deprecated get_event_loop() pattern)
            try:
                result = await asyncio.to_thread(impl, **arguments)
                content = json.dumps(result) if not isinstance(result, str) else result
            except Exception as e:
                content = f"Tool error: {type(e).__name__}: {e}"
return {
"role": "tool",
"tool_call_id": tool_call.id,
"content": content,
}
return await asyncio.gather(*[run_one(tc) for tc in tool_calls])
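The "max, not sum" claim is easy to check with stand-in tools. This sketch uses a hypothetical `slow_lookup` function that just sleeps; the 0.2-second delay is arbitrary.

```python
import asyncio
import time


def slow_lookup(name: str, delay: float = 0.2) -> str:
    """Stand-in for a blocking tool such as an HTTP request."""
    time.sleep(delay)
    return f"{name}: done"


async def run_all(names: list[str]) -> list[str]:
    # asyncio.to_thread (Python 3.9+) runs each blocking call in a worker
    # thread; gather returns results in the same order as the inputs
    return await asyncio.gather(*(asyncio.to_thread(slow_lookup, n) for n in names))


start = time.perf_counter()
results = asyncio.run(run_all(["weather", "search", "database"]))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s (sequential would be about 0.60s)")
```

Because `gather` preserves input order, the results can be paired back with their tool call ids by position.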
Part 6 -- Building a ToolRegistry
Rather than maintaining tool schemas and implementations separately (and keeping them in sync), build a registry that derives the schema automatically from the Python function:
import inspect
import json
import re
from typing import Any, Callable, get_type_hints
from functools import wraps
class ToolRegistry:
"""Auto-registers Python functions as LLM tools.
Derives tool schemas from:
- Function name (used as tool name)
- Docstring first paragraph (used as tool description)
- Type hints (mapped to JSON Schema types)
- Default values (determines if parameter is required)
- :param name: doc lines in docstring (parameter descriptions)
This means you write normal Python functions with proper docstrings and
the registry handles all the JSON Schema boilerplate. The schema stays
in sync with the implementation automatically.
"""
# Python type -> JSON Schema type mapping
_TYPE_MAP: dict[type, str] = {
str: "string",
int: "integer",
float: "number",
bool: "boolean",
list: "array",
dict: "object",
}
def __init__(self):
self._tools: dict[str, dict] = {} # name -> Anthropic tool definition
self._implementations: dict[str, Callable] = {} # name -> Python function
def tool(
self,
name: str | None = None,
description: str | None = None,
) -> Callable:
"""Decorator to register a function as an LLM tool.
Usage:
@registry.tool()
def get_weather(city: str, units: str = "celsius") -> str:
'''Get current weather for a city.
:param city: City name, e.g. 'London'
:param units: Temperature units: 'celsius' or 'fahrenheit'
'''
...
"""
def decorator(func: Callable) -> Callable:
tool_name = name or func.__name__
tool_desc = description or self._extract_description(func)
schema = self._build_schema(func)
self._tools[tool_name] = {
"name": tool_name,
"description": tool_desc,
"input_schema": schema,
}
self._implementations[tool_name] = func
@wraps(func)
def wrapper(*args, **kwargs):
return func(*args, **kwargs)
wrapper._tool_name = tool_name
return wrapper
return decorator
def _extract_description(self, func: Callable) -> str:
"""Extract the first paragraph of the docstring as the description."""
doc = inspect.getdoc(func) or ""
# First paragraph is everything before the first blank line or :param
lines = []
for line in doc.splitlines():
if line.startswith(":param") or line.startswith(":return"):
break
if not line and lines:
break
if line:
lines.append(line)
return " ".join(lines) if lines else f"Execute the {func.__name__} operation."
def _extract_param_docs(self, func: Callable) -> dict[str, str]:
"""Extract :param name: description lines from the docstring."""
doc = inspect.getdoc(func) or ""
param_docs: dict[str, str] = {}
for match in re.finditer(r":param\s+(\w+):\s*(.+)", doc):
param_docs[match.group(1)] = match.group(2).strip()
return param_docs
def _build_schema(self, func: Callable) -> dict:
"""Build a JSON Schema input_schema from the function signature."""
sig = inspect.signature(func)
hints = get_type_hints(func)
param_docs = self._extract_param_docs(func)
properties: dict[str, dict] = {}
required: list[str] = []
for param_name, param in sig.parameters.items():
if param_name in ("self", "cls"):
continue
            # Determine JSON Schema type from type hint
            hint = hints.get(param_name, str)
            args = getattr(hint, "__args__", ())
            if type(None) in args:
                # Optional[X] / X | None: use X and ignore the None alternative
                non_none = [a for a in args if a is not type(None)]
                hint = non_none[0] if non_none else str
            # Reduce parameterised generics such as list[str] to their base type,
            # so list[str] maps to "array" rather than falling back to "string"
            hint = getattr(hint, "__origin__", hint)
            json_type = self._TYPE_MAP.get(hint, "string")
prop: dict[str, Any] = {"type": json_type}
if param_name in param_docs:
prop["description"] = param_docs[param_name]
properties[param_name] = prop
# If no default value, the parameter is required
if param.default is inspect.Parameter.empty:
required.append(param_name)
return {
"type": "object",
"properties": properties,
"required": required,
}
def get_tools(self) -> list[dict]:
"""Return the list of tool definitions for the Anthropic API."""
return list(self._tools.values())
def get_openai_tools(self) -> list[dict]:
"""Return tool definitions in OpenAI format."""
openai_tools = []
for tool in self._tools.values():
openai_tools.append({
"type": "function",
"function": {
"name": tool["name"],
"description": tool["description"],
"parameters": tool["input_schema"],
},
})
return openai_tools
def get_implementations(self) -> dict[str, Callable]:
"""Return the tool implementation dict for the agent loop."""
return dict(self._implementations)
def __contains__(self, name: str) -> bool:
return name in self._tools
def __repr__(self) -> str:
return f"ToolRegistry({list(self._tools.keys())})"
# Example: registering tools with the registry
registry = ToolRegistry()
@registry.tool()
def get_weather(city: str, units: str = "celsius") -> str:
"""Get current weather conditions for a city.
:param city: City name, e.g. 'London' or 'New York'
:param units: Temperature units: 'celsius' or 'fahrenheit'
"""
# In production: call a real weather API
weather_data = {
"london": {"temp_c": 15, "description": "Overcast", "humidity": 82},
"new york": {"temp_c": 22, "description": "Sunny", "humidity": 45},
"tokyo": {"temp_c": 28, "description": "Partly cloudy", "humidity": 70},
}
data = weather_data.get(city.lower(), {"temp_c": 20, "description": "Unknown", "humidity": 50})
temp = data["temp_c"]
if units == "fahrenheit":
temp = round(temp * 9/5 + 32, 1)
unit_str = "F"
else:
unit_str = "C"
return json.dumps({
"city": city,
"temperature": f"{temp}{unit_str}",
"description": data["description"],
"humidity_pct": data["humidity"],
})
@registry.tool()
def web_search(query: str, num_results: int = 5) -> str:
"""Search the web for current information.
:param query: Search query string.
:param num_results: Number of results to return (1-10).
"""
# In production: call a real search API (Brave, Bing, SerpAPI, etc.)
# This is a stub that returns plausible fake results for testing
return json.dumps([
{
"title": f"Result {i+1} for: {query}",
"snippet": f"This is result {i+1}. Relevant content about {query}.",
"url": f"https://example.com/result/{i+1}",
}
for i in range(min(num_results, 5))
])
# Inspect the auto-generated schema
print(json.dumps(registry.get_tools()[0], indent=2))
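The registry maps each hint to a bare JSON type. If you want richer schemas, the standard library's `typing.get_origin` and `typing.get_args` can recover element types and literal choices. The function below is a sketch of that idea, not part of the registry above: `hint_to_schema` is a hypothetical name and it covers only common cases.

```python
import types
from typing import Any, Literal, Optional, Union, get_args, get_origin

_SCALARS = {str: "string", int: "integer", float: "number", bool: "boolean"}

# typing.Union covers Optional[X]; types.UnionType (3.10+) covers X | None
_UNION_ORIGINS = (Union,) + ((types.UnionType,) if hasattr(types, "UnionType") else ())


def hint_to_schema(hint: Any) -> dict[str, Any]:
    """Map a Python type hint to a JSON Schema fragment (common cases only)."""
    origin = get_origin(hint)
    args = get_args(hint)
    if origin is not None and origin in _UNION_ORIGINS:
        # Optional[X]: describe X and ignore the None alternative
        non_none = [a for a in args if a is not type(None)]
        return hint_to_schema(non_none[0]) if non_none else {"type": "string"}
    if origin is Literal:
        # Literal["a", "b"] becomes an enum of the allowed values
        return {"type": _SCALARS.get(type(args[0]), "string"), "enum": list(args)}
    if origin is list or hint is list:
        schema: dict[str, Any] = {"type": "array"}
        if args:
            schema["items"] = hint_to_schema(args[0])
        return schema
    if origin is dict or hint is dict:
        return {"type": "object"}
    return {"type": _SCALARS.get(hint, "string")}


print(hint_to_schema(list[str]))
print(hint_to_schema(Optional[int]))
print(hint_to_schema(Literal["celsius", "fahrenheit"]))
```

With `Literal` hints, a parameter such as `units` would get an `enum` in its schema, which tells the model the allowed values instead of relying on the description text.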
Part 7 -- Real Tool Implementations
Code Execution Tool
import subprocess
import tempfile
import os
from pathlib import Path
@registry.tool()
def execute_python(code: str, timeout_seconds: int = 10) -> str:
    """Execute Python code in a separate subprocess and return the output.
Use this to run calculations, data transformations, or verify logic.
Do not use this to run code that has side effects on production systems.
:param code: Python code to execute.
:param timeout_seconds: Maximum execution time in seconds (max 30).
"""
timeout_seconds = min(timeout_seconds, 30) # Hard cap -- model cannot override
    # Write code to a temp file rather than passing it via -c to avoid
    # command-line length limits. Note that a subprocess is not a sandbox:
    # the code still runs with this process's OS permissions.
with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
f.write(code)
tmp_path = f.name
try:
result = subprocess.run(
["python3", tmp_path],
capture_output=True,
text=True,
timeout=timeout_seconds,
# Restrict environment: no access to production env vars
env={
"PATH": "/usr/bin:/bin",
"HOME": "/tmp",
"PYTHONPATH": "",
},
)
output_parts = []
if result.stdout.strip():
output_parts.append(f"stdout:\n{result.stdout.strip()}")
if result.stderr.strip():
output_parts.append(f"stderr:\n{result.stderr.strip()}")
if result.returncode != 0:
output_parts.append(f"exit_code: {result.returncode}")
return "\n\n".join(output_parts) or "(no output)"
except subprocess.TimeoutExpired:
return f"Execution timed out after {timeout_seconds} seconds."
finally:
os.unlink(tmp_path) # Always clean up the temp file
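A few inexpensive hardening steps can be layered on the same idea: run the interpreter in isolated mode, use a throwaway working directory, and cap how much output goes back to the model. This is a sketch under those assumptions, with the hypothetical name `run_isolated` and an arbitrary output cap. It is still not a security boundary; for untrusted code you would want a container, VM, or similar OS-level isolation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

MAX_OUTPUT_CHARS = 10_000  # Arbitrary cap chosen for this sketch


def run_isolated(code: str, timeout_seconds: float = 10.0) -> str:
    """Run code with the current interpreter in isolated mode (-I),
    inside a scratch directory that is deleted afterwards."""
    with tempfile.TemporaryDirectory() as scratch:
        script = Path(scratch) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                # -I ignores PYTHON* environment variables and the user site dir
                [sys.executable, "-I", str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_seconds,
                cwd=scratch,
            )
        except subprocess.TimeoutExpired:
            return f"Execution timed out after {timeout_seconds} seconds."
    output = (result.stdout + result.stderr).strip()
    if len(output) > MAX_OUTPUT_CHARS:
        output = output[:MAX_OUTPUT_CHARS] + "\n... [output truncated]"
    return output or "(no output)"


print(run_isolated("print(1 + 1)"))  # 2
```

Capping the output matters as much as the timeout: a single `print` in a loop can otherwise fill the context window on the next model call.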
Database Query Tool
import sqlite3
from typing import Any
class DatabaseQueryTool:
"""Wraps a database connection for use as an LLM tool.
Demonstrates the pattern of tools that hold state (a connection pool
or connection object) while exposing a stateless interface to the model.
"""
def __init__(self, db_path: str, read_only: bool = True):
"""
Args:
db_path: Path to the SQLite database file.
read_only: If True, only SELECT queries are allowed.
Set False only if you trust the model completely.
"""
self._db_path = db_path
self._read_only = read_only
def query(self, sql: str, limit: int = 50) -> str:
"""Execute a SQL query and return results as JSON.
:param sql: SQL query to execute. Only SELECT queries are allowed.
:param limit: Maximum number of rows to return (max 100).
"""
limit = min(limit, 100) # Hard cap
# Reject non-SELECT queries if read_only mode is enabled
if self._read_only:
sql_stripped = sql.strip().upper()
if not sql_stripped.startswith("SELECT"):
return json.dumps({
"error": "Only SELECT queries are permitted in read-only mode.",
"rejected_query": sql[:100],
})
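        conn = None  # So the finally block is safe if connect() itself fails
        try:
            # Open the file read-only at the SQLite level when read_only is set,
            # so writes fail even if the SELECT check above is bypassed.
            # (Assumes db_path contains no characters that need URI escaping.)
            mode = "ro" if self._read_only else "rwc"
            conn = sqlite3.connect(f"file:{self._db_path}?mode={mode}", uri=True)
            conn.row_factory = sqlite3.Row  # Rows act like dicts
            cursor = conn.execute(sql)
            columns = [desc[0] for desc in cursor.description or []]
            # fetchmany caps the rows without rewriting the model's SQL, which
            # would break on queries that already end in LIMIT or a semicolon
            rows = [dict(zip(columns, row)) for row in cursor.fetchmany(limit)]
            return json.dumps({
                "columns": columns,
                "row_count": len(rows),
                "rows": rows,
            }, default=str)
        except (sqlite3.Error, sqlite3.Warning) as e:
            return json.dumps({"error": f"SQL error: {e}", "query": sql[:200]})
        finally:
            if conn is not None:
                conn.close()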
# Register the database tool on an instance
db_tool = DatabaseQueryTool("/data/analytics.db", read_only=True)
# Manually add to registry (for instance methods that cannot be decorated directly)
registry._implementations["query_database"] = db_tool.query
registry._tools["query_database"] = {
"name": "query_database",
"description": (
"Execute a read-only SQL SELECT query against the analytics database. "
"Use this to retrieve data about users, events, and metrics."
),
"input_schema": {
"type": "object",
"properties": {
"sql": {"type": "string", "description": "SQL SELECT query to execute."},
"limit": {
"type": "integer",
"description": "Max rows to return. Default 50, max 100.",
"minimum": 1,
"maximum": 100,
},
},
"required": ["sql"],
},
}
File Operations Tool
import json
from pathlib import Path
class FileReadTool:
"""Read-only file access tool for LLM agents.
Restricts access to a whitelist of allowed directories to prevent
the model from reading sensitive system files.
"""
def __init__(self, allowed_dirs: list[str]):
self._allowed = [Path(d).resolve() for d in allowed_dirs]
def read_file(self, path: str) -> str:
"""Read the contents of a text file.
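        :param path: Path to the file. Must resolve to a location inside an allowed directory.
        """
        resolved = Path(path).resolve()
        # Security check: ensure the path is within an allowed directory.
        # Compare path components, not string prefixes: a prefix test would
        # accept /data/docs_private when only /data/docs is allowed.
        if not any(
            resolved.is_relative_to(allowed_dir)
            for allowed_dir in self._allowed
        ):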
return json.dumps({
"error": f"Access denied: {path} is outside allowed directories.",
"allowed_directories": [str(d) for d in self._allowed],
})
if not resolved.exists():
return json.dumps({"error": f"File not found: {path}"})
if not resolved.is_file():
return json.dumps({"error": f"Path is not a file: {path}"})
try:
content = resolved.read_text(encoding="utf-8")
# Limit content size to avoid filling the context window
if len(content) > 50_000:
content = content[:50_000] + "\n\n... [file truncated at 50,000 chars]"
return json.dumps({"path": str(resolved), "content": content})
except UnicodeDecodeError:
return json.dumps({
"error": f"Cannot read {path}: file is not valid UTF-8 text."
})
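The difference between a string-prefix check and a component-wise check is easiest to see with two sibling directories. This sketch builds them in a temporary location; `is_inside` and `prefix_check` are hypothetical helper names, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path


def is_inside(candidate: Path, allowed_dir: Path) -> bool:
    """True if candidate, after resolving symlinks and '..', lies under allowed_dir."""
    return candidate.resolve().is_relative_to(allowed_dir.resolve())


def prefix_check(candidate: Path, allowed_dir: Path) -> bool:
    """The string-prefix comparison, shown for contrast."""
    return str(candidate.resolve()).startswith(str(allowed_dir.resolve()))


base = Path(tempfile.mkdtemp())
allowed = base / "docs"
sibling = base / "docs_private"
allowed.mkdir()
sibling.mkdir()
secret = sibling / "secret.txt"
secret.write_text("not for the model")
public = allowed / "readme.txt"
public.write_text("fine to read")

print(prefix_check(secret, allowed))  # True: wrongly accepted
print(is_inside(secret, allowed))     # False: correctly rejected
print(is_inside(public, allowed))     # True
print(is_inside(allowed / ".." / "docs_private" / "secret.txt", allowed))  # False
```

Resolving before comparing is what defeats `..` segments and symlinks; the component-wise comparison is what defeats look-alike sibling directories.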
Part 8 -- Timeout and Safety Limits
Production agents need hard limits. Models can enter loops, request tools that are slow, or make more calls than the user is willing to pay for.
import asyncio
import time
from dataclasses import dataclass, field
@dataclass
class AgentLimits:
"""Safety limits for an agentic session."""
max_iterations: int = 10 # Max tool call rounds
max_wall_time_seconds: float = 60.0 # Max total elapsed time
max_tool_calls: int = 25 # Max total individual tool invocations
max_input_tokens: int = 100_000 # Max cumulative input tokens
def __post_init__(self):
self._start_time = time.monotonic()
self._tool_call_count = 0
self._total_input_tokens = 0
@property
def elapsed(self) -> float:
return time.monotonic() - self._start_time
def check(self, iteration: int, new_tool_calls: int = 0, new_tokens: int = 0) -> None:
"""Check all limits. Raises AgentLimitError if any limit is exceeded."""
self._tool_call_count += new_tool_calls
self._total_input_tokens += new_tokens
if iteration >= self.max_iterations:
raise AgentLimitError(
f"Exceeded max iterations ({self.max_iterations}). "
f"This usually means the agent is stuck in a loop. "
f"Check your tool descriptions -- are they clear enough?"
)
if self.elapsed > self.max_wall_time_seconds:
raise AgentLimitError(
f"Exceeded max wall time ({self.max_wall_time_seconds}s). "
f"Elapsed: {self.elapsed:.1f}s. "
f"Consider adding async timeouts to slow tools."
)
if self._tool_call_count > self.max_tool_calls:
raise AgentLimitError(
f"Exceeded max tool calls ({self.max_tool_calls}). "
f"Total so far: {self._tool_call_count}."
)
if self._total_input_tokens > self.max_input_tokens:
raise AgentLimitError(
f"Exceeded max input tokens ({self.max_input_tokens:,}). "
f"Total so far: {self._total_input_tokens:,}."
)
class AgentLimitError(RuntimeError):
    """Raised when an agent exceeds a configured safety limit."""


def run_agent_with_limits(
    user_message: str,
    tools: list[dict],
    implementations: dict[str, callable],
    limits: AgentLimits | None = None,
    model: str = "claude-opus-4-6",
) -> str:
    """Run the agent loop with safety limits enforced on every iteration."""
    limits = limits or AgentLimits()
    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": user_message}]
    iteration = 0
    while True:
        response = client.messages.create(
            model=model,
            max_tokens=4_096,
            tools=tools,
            messages=messages,
        )
        # Count how many tool calls are in this response
        tool_calls_this_round = sum(
            1 for block in response.content
            if isinstance(block, ToolUseBlock)
        )
        # Check all limits before proceeding
        limits.check(
            iteration=iteration,
            new_tool_calls=tool_calls_this_round,
            new_tokens=response.usage.input_tokens,
        )
        if response.stop_reason == "end_turn":
            for block in response.content:
                if isinstance(block, TextBlock):
                    return block.text
            return ""
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = handle_multi_tool_response(response, implementations)
            messages.append({"role": "user", "content": tool_results})
            iteration += 1
            continue
        break
    return ""
Part 9 -- Testing Tool-Using Agents
Testing agents is different from testing regular functions. Because LLM outputs are non-deterministic, you cannot assert on exact text. Instead you test behaviour: "does the agent use the right tools?", "does it recover from tool errors?", and "does it produce a final answer?".
import pytest
from unittest.mock import MagicMock, patch


class MockToolRegistry:
    """A tool registry with controllable tool implementations for testing."""

    def __init__(self):
        self._calls: list[dict] = []
        self._responses: dict[str, str] = {}

    def set_response(self, tool_name: str, response: str) -> None:
        """Pre-configure what a tool will return."""
        self._responses[tool_name] = response

    def set_error(self, tool_name: str, error: str) -> None:
        """Pre-configure a tool to fail with an error message."""
        self._responses[tool_name] = f"ERROR:{error}"

    def get_implementation(self, tool_name: str) -> callable:
        """Return a mock implementation that records calls and returns pre-configured responses."""
        registry = self

        def mock_tool(**kwargs) -> str:
            registry._calls.append({"tool": tool_name, "args": kwargs})
            response = registry._responses.get(tool_name, '{"result": "default mock response"}')
            if response.startswith("ERROR:"):
                # Strip the "ERROR:" prefix and raise the remaining message
                raise RuntimeError(response[6:])
            return response

        return mock_tool

    @property
    def calls(self) -> list[dict]:
        return list(self._calls)

    def was_called(self, tool_name: str) -> bool:
        return any(c["tool"] == tool_name for c in self._calls)

    def call_count(self, tool_name: str) -> int:
        return sum(1 for c in self._calls if c["tool"] == tool_name)


def test_agent_calls_weather_tool():
    """Agent should call the weather tool when asked about weather."""
    mock_registry = MockToolRegistry()
    mock_registry.set_response(
        "get_weather",
        '{"city": "London", "temperature": "15C", "description": "Cloudy"}'
    )
    implementations = {
        "get_weather": mock_registry.get_implementation("get_weather"),
    }
    # NOTE: in unit tests, mock the LLM response too.
    # Only do real API calls in integration tests.
    # Here we show the structure -- in practice use pytest-mock or
    # a real API call with test credentials.
    # registry is assumed to be the ToolRegistry instance built earlier in this
    # lesson; it supplies the real tool schemas while the mocks supply behaviour.
    result = run_agent_with_limits(
        user_message="What is the weather in London?",
        tools=registry.get_tools(),
        implementations=implementations,
    )
    assert mock_registry.was_called("get_weather"), "Agent should have called get_weather"
    assert mock_registry.call_count("get_weather") == 1, "Should only call once"
    assert "london" in result.lower() or "15" in result, "Response should mention London or temperature"


def test_agent_recovers_from_tool_error():
    """Agent should produce a useful response even when a tool fails."""
    mock_registry = MockToolRegistry()
    mock_registry.set_error(
        "get_weather",
        "API rate limit exceeded. Retry after 60 seconds."
    )
    implementations = {
        "get_weather": mock_registry.get_implementation("get_weather"),
    }
    # The agent should not crash. It should tell the user the tool failed.
    result = run_agent_with_limits(
        user_message="What is the weather in London?",
        tools=registry.get_tools(),
        implementations=implementations,
    )
    assert result, "Agent should produce some response even after tool failure"
    # The model should have received the error and reported it to the user


def test_agent_respects_iteration_limit():
    """Agent should raise AgentLimitError if stuck in a loop."""
    # A tool that always returns a result that makes the model call it again
    call_count = [0]

    def loopy_tool(**kwargs) -> str:
        call_count[0] += 1
        return '{"status": "incomplete", "call_again": true}'

    with pytest.raises(AgentLimitError):
        run_agent_with_limits(
            user_message="Keep calling loopy_tool until it reports that it is complete.",
            tools=[{
                "name": "loopy_tool",
                "description": "A tool that always says to call it again.",
                "input_schema": {"type": "object", "properties": {}, "required": []},
            }],
            implementations={"loopy_tool": loopy_tool},
            limits=AgentLimits(max_iterations=3),
        )
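The note in test_agent_calls_weather_tool says to mock the LLM response in unit tests, but the tests above still call the real API. One way to script the model is sketched below. Everything in it is hypothetical and written for illustration: ScriptedClient, fake_response, FakeTextBlock, FakeToolUseBlock, and run_loop are not part of the Anthropic SDK or of this lesson's code. run_loop is a deliberately simplified stand-in for run_agent_with_limits that checks block.type instead of using isinstance.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class FakeTextBlock:
    text: str
    type: str = "text"


@dataclass
class FakeToolUseBlock:
    id: str
    name: str
    input: dict
    type: str = "tool_use"


def fake_response(stop_reason, *blocks):
    """Build an object with the three attributes the loop reads."""
    return SimpleNamespace(
        stop_reason=stop_reason,
        content=list(blocks),
        usage=SimpleNamespace(input_tokens=10, output_tokens=5),
    )


class ScriptedClient:
    """Hypothetical stand-in for a model client: replays canned responses in order."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.requests = []  # one entry per messages.create call
        self.messages = SimpleNamespace(create=self._create)

    def _create(self, **kwargs):
        # Snapshot the message list, because the loop keeps mutating it.
        self.requests.append({**kwargs, "messages": list(kwargs.get("messages", []))})
        return self._responses.pop(0)


def run_loop(client, user_message, implementations, max_iterations=5):
    """Simplified agent loop for this demo only."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_iterations):
        response = client.messages.create(
            model="test-model", max_tokens=256, messages=messages
        )
        if response.stop_reason == "end_turn":
            return "".join(b.text for b in response.content if b.type == "text")
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = implementations[block.name](**block.input)
                results.append(
                    {"type": "tool_result", "tool_use_id": block.id, "content": output}
                )
        messages.append({"role": "user", "content": results})
    raise RuntimeError("iteration limit reached")


def test_loop_with_scripted_model():
    calls = []

    def get_weather(city: str) -> str:
        calls.append(city)
        return '{"city": "London", "temperature": "15C"}'

    client = ScriptedClient([
        fake_response("tool_use", FakeToolUseBlock("toolu_1", "get_weather", {"city": "London"})),
        fake_response("end_turn", FakeTextBlock("It is 15C in London.")),
    ])
    answer = run_loop(client, "What is the weather in London?", {"get_weather": get_weather})
    assert calls == ["London"]
    assert answer == "It is 15C in London."
    # The second request must carry the tool result back to the model.
    last_message = client.requests[1]["messages"][-1]
    assert last_message["content"][0]["tool_use_id"] == "toolu_1"
```

Because the script is fixed, this test is deterministic and needs no network access. Applying the same idea to run_agent_with_limits would take two further steps that this sketch does not show: replacing the anthropic.Anthropic() call inside the function (for example with unittest.mock.patch), and building responses from block objects that pass the loop's isinstance checks.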
Part 10 -- Complete Working Example
Putting it all together: a research agent that can search the web, execute code, and extract numbers from text.
import anthropic
import json


def build_research_agent() -> tuple[list[dict], dict[str, callable]]:
    """Build a research agent with web search and code execution capabilities."""
    agent_registry = ToolRegistry()

    @agent_registry.tool()
    def search_web(query: str, num_results: int = 5) -> str:
        """Search the web for current information and news.
        Use this when you need recent information not in your training data,
        or when the user asks about current events, prices, or status.
        :param query: Search query. Be specific for better results.
        :param num_results: Number of search results to return (1-10).
        """
        # Production: use Brave Search API, Bing, or SerpAPI
        # Stub for demonstration
        return json.dumps({
            "query": query,
            "results": [
                {
                    "title": f"Article about {query}",
                    "snippet": f"Relevant information about {query} from a credible source.",
                    "url": "https://example.com/article",
                    "published": "2026-03-01",
                }
            ]
        })

    @agent_registry.tool()
    def run_calculation(code: str) -> str:
        """Execute Python code for calculations and data analysis.
        Use this for mathematical calculations, data processing, or
        any computation that is easier to do with code than describe in words.
        :param code: Python code to execute. Must print the result to stdout.
        """
        # Use the execute_python tool from earlier
        return execute_python(code, timeout_seconds=10)

    @agent_registry.tool()
    def extract_numbers(text: str) -> str:
        """Extract all numbers from a text string.
        Use this to parse numeric values from web search results or
        other text when you need to perform calculations.
        :param text: Text to extract numbers from.
        """
        import re
        numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
        return json.dumps({
            "numbers_found": [float(n) for n in numbers],
            "count": len(numbers),
        })

    return agent_registry.get_tools(), agent_registry.get_implementations()


def run_research_session(question: str) -> str:
    """Run a research session to answer a complex question."""
    tools, implementations = build_research_agent()
    result = run_agent_with_limits(
        user_message=question,
        tools=tools,
        implementations=implementations,
        limits=AgentLimits(
            max_iterations=8,
            max_wall_time_seconds=30.0,
            max_tool_calls=15,
        ),
        model="claude-opus-4-6",
    )
    return result


if __name__ == "__main__":
    answer = run_research_session(
        "What is the approximate compound annual growth rate of Python's "
        "popularity index on TIOBE from 2015 to 2025? Use web search to "
        "find the relevant numbers, then calculate the CAGR."
    )
    print(answer)
Key Takeaways
- Tool use is a multi-turn protocol: the model requests a tool call; your Python code executes it; you report the result; the model continues. One user question may require several API calls.
- The agentic loop needs safety limits: always set max_iterations, max_wall_time_seconds, and max_tool_calls. Without limits, a confused model can loop indefinitely and generate unbounded API costs.
- Never crash on tool failure: wrap every tool call in execute_tool_safely. Send the error description back to the model and let it decide how to respond. The model is surprisingly good at recovering from tool errors.
- The ToolRegistry pattern keeps schemas and implementations co-located. The schema is derived from type hints and docstrings, so documentation and the schema stay in sync automatically.
- Parallel tool calls are supported by both APIs. Execute them concurrently with asyncio.gather to minimise wall-clock latency.
- Tool descriptions are prompt engineering: a vague description leads to wrong tool choices. Be explicit about when to use each tool, what format the input should be in, and what the output represents.
- Security is your responsibility: the model controls the arguments your tools receive. Validate all inputs, enforce read-only modes on databases, restrict file access to allowed directories, and cap code execution time.
- Test at the behaviour level: verify that the agent calls the right tools, recovers from errors, and respects limits. Mock both the LLM and the tools for fast unit tests; run real API calls only in integration tests.
Practice Problems
Problem 1: Schema Validator
The ToolRegistry._build_schema method does not handle list[str], dict[str, Any], or Optional[int] type hints. Extend it to:
- Handle list[str] as {"type": "array", "items": {"type": "string"}}
- Handle dict[str, Any] as {"type": "object"}
- Handle Optional[X] (same as X | None) correctly -- the parameter should not be in required
- Handle Literal["a", "b", "c"] as {"type": "string", "enum": ["a", "b", "c"]}
Write the extended _build_schema method and include at least 5 test cases covering these new types.
Problem 2: Tool Call Logger
Build a ToolCallLogger wrapper class that:
- Wraps any existing tool implementation
- Records: tool name, arguments, result, execution time (milliseconds), whether it was an error
- Stores records in memory with a configurable max size (circular buffer)
- Exposes a summary() method showing: total calls per tool, average execution time per tool, error rate per tool
- Can be used transparently: implementations["get_weather"] = logger.wrap("get_weather", original_fn)
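Two standard-library pieces cover most of the bookkeeping here: collections.deque with maxlen behaves as a circular buffer, and time.perf_counter measures elapsed time. The sketch below only demonstrates those two pieces. timed_call is a hypothetical helper, not the requested ToolCallLogger, and unlike a transparent wrapper it records the exception instead of re-raising it.

```python
import time
from collections import deque


def timed_call(fn, **kwargs) -> dict:
    """Call fn(**kwargs) and return a record with timing and error status."""
    start = time.perf_counter()
    try:
        result, is_error = fn(**kwargs), False
    except Exception as exc:
        # Demo only: a transparent wrapper would log and then re-raise.
        result, is_error = f"{type(exc).__name__}: {exc}", True
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {"args": kwargs, "result": result, "elapsed_ms": elapsed_ms, "is_error": is_error}


# A deque with maxlen silently drops the oldest record once it is full.
history = deque(maxlen=2)
for city in ["Paris", "London", "Tokyo"]:
    history.append(timed_call(lambda city: f"Sunny in {city}", city=city))

print([record["args"]["city"] for record in history])  # ['London', 'Tokyo']
```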
Problem 3: Retry with Backoff
Some tools fail transiently (rate limits, network errors). Build a RetryingToolExecutor that:
- Wraps execute_tool_safely
- Retries a tool up to N times on transient errors (detect transient errors by looking for "rate limit", "timeout", "503", "429" in the error message)
- Uses exponential backoff: wait 1s, 2s, 4s between retries
- Never retries logic errors (wrong arguments, access denied)
- Adds metadata to the result: {"retries": 2, "final_result": ...}
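The two decisions in this problem, whether an error is transient and how long to wait, can be isolated as pure functions before any retry loop is written. The sketch below does only that. is_transient and backoff_delays are hypothetical names, and the case-insensitive substring match is an assumption about how the error text looks.

```python
TRANSIENT_MARKERS = ("rate limit", "timeout", "503", "429")


def is_transient(error_message: str) -> bool:
    """True if the message contains one of the transient-error markers."""
    lowered = error_message.lower()
    return any(marker in lowered for marker in TRANSIENT_MARKERS)


def backoff_delays(max_retries: int, base_seconds: float = 1.0) -> list[float]:
    """Delays before each retry: base, 2*base, 4*base, ..."""
    return [base_seconds * 2 ** attempt for attempt in range(max_retries)]


print(is_transient("API rate limit exceeded. Retry after 60 seconds."))  # True
print(is_transient("Access denied for table users"))                     # False
print(backoff_delays(3))                                                 # [1.0, 2.0, 4.0]
```

Keeping these as separate functions makes them easy to unit test without sleeping, and the retry loop can take the sleep function as a parameter for the same reason.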
Problem 4: Streaming Agent Loop
The current run_agent returns only after all tool calls are complete. For user-facing applications, you want to stream intermediate status. Build a generator-based agent loop:
from collections.abc import Iterator


def stream_agent_events(
    user_message: str, tools, implementations
) -> Iterator[dict]:
    # Yield events like:
    # {"type": "tool_start", "name": "get_weather", "args": {...}}
    # {"type": "tool_result", "name": "get_weather", "result": "..."}
    # {"type": "text_delta", "text": "Based on the weather..."}
    # {"type": "done", "final_text": "..."}
    ...
This allows a frontend to show "Searching the web..." spinners and stream the final answer token by token.
Problem 5: Multi-Agent Orchestration
Build a simple multi-agent system where a "Planner" agent breaks a complex question into sub-tasks, and "Worker" agents (each with different tool sets) execute the sub-tasks in parallel.
The Planner should:
- Receive a complex user question
- Break it into 2-4 independent sub-tasks using a create_plan tool
- Return a JSON list of subtasks
The Orchestrator should:
- Run the Planner to get the task list
- Dispatch each subtask to a Worker agent with the relevant tools
- Collect Worker results
- Run a "Synthesiser" agent that combines all results into a final answer
Design the data flow and implement at least the Planner + Orchestrator. You do not need real external tools -- stubs are fine.
